FDM: a graph-based statistical method to detect differential transcription using RNA-seq data
نویسندگان
چکیده
MOTIVATION In eukaryotic cells, alternative splicing expands the diversity of RNA transcripts and plays an important role in tissue-specific differentiation, and can be misregulated in disease. To understand these processes, there is a great need for methods to detect differential transcription between samples. Our focus is on samples observed using short-read RNA sequencing (RNA-seq). METHODS We characterize differential transcription between two samples as the difference in the relative abundance of the transcript isoforms present in the samples. The magnitude of differential transcription of a gene between two samples can be measured by the square root of the Jensen Shannon Divergence (JSD*) between the gene's transcript abundance vectors in each sample. We define a weighted splice-graph representation of RNA-seq data, summarizing in compact form the alignment of RNA-seq reads to a reference genome. The flow difference metric (FDM) identifies regions of differential RNA transcript expression between pairs of splice graphs, without need for an underlying gene model or catalog of transcripts. We present a novel non-parametric statistical test between splice graphs to assess the significance of differential transcription, and extend it to group-wise comparison incorporating sample replicates. RESULTS Using simulated RNA-seq data consisting of four technical replicates of two samples with varying transcription between genes, we show that (i) the FDM is highly correlated with JSD* (r=0.82) when average RNA-seq coverage of the transcripts is sufficiently deep; and (ii) the FDM is able to identify 90% of genes with differential transcription when JSD* >0.28 and coverage >7. This represents higher sensitivity than Cufflinks (without annotations) and rDiff (MMD), which respectively identified 69 and 49% of the genes in this region as differential transcribed. Using annotations identifying the transcripts, Cufflinks was able to identify 86% of the genes in this region as differentially transcribed. Using experimental data consisting of four replicates each for two cancer cell lines (MCF7 and SUM102), FDM identified 1425 genes as significantly different in transcription. Subsequent study of the samples using quantitative real time polymerase chain reaction (qRT-PCR) of several differential transcription sites identified by FDM, confirmed significant differences at these sites. AVAILABILITY http://csbio-linux001.cs.unc.edu/nextgen/software/FDM CONTACT: [email protected] SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
منابع مشابه
A Graph-Based Clustering Approach to Identify Cell Populations in Single-Cell RNA Sequencing Data
Introduction: The emergence of single-cell RNA-sequencing (scRNA-seq) technology has provided new information about the structure of cells, and provided data with very high resolution of the expression of different genes for each cell at a single time. One of the main uses of scRNA-seq is data clustering based on expressed genes, which sometimes leads to the detection of rare cell populations. ...
متن کاملA Graph-Based Clustering Approach to Identify Cell Populations in Single-Cell RNA Sequencing Data
Introduction: The emergence of single-cell RNA-sequencing (scRNA-seq) technology has provided new information about the structure of cells, and provided data with very high resolution of the expression of different genes for each cell at a single time. One of the main uses of scRNA-seq is data clustering based on expressed genes, which sometimes leads to the detection of rare cell populations. ...
متن کاملMMDiff: Statistical Testing for ChIP-Seq data sets
ChIP-Seq has rapidly become the dominant experimental technique to determine the location of transcription factor binding sites and histone modifications. Typically, computational peak finders, such as Macs [Zhang et al., 2008], are used to identify potential candidate regions, i.e. regions with significantly enriched read coverage compared to some background. In the following, we will simply c...
متن کاملInvestigating the Function of Predicted Proteins from RNA-Seq Data in Holstein and Cholistani Cattle Breeds
This study was performed to determine the digital expression profile of different genes expressed in Holstein and Cholistani breeds as well as to evaluate the performance of predicted proteins derived from differentially expressed genes between these two breeds using RNA-Seq data. For this purpose, the whole mRNA sequence for a blood sample of American Holstein and Pakistani Cholistani cattle p...
متن کاملDiffSplice: the genome-wide detection of differential splicing events with RNA-seq
The RNA transcriptome varies in response to cellular differentiation as well as environmental factors, and can be characterized by the diversity and abundance of transcript isoforms. Differential transcription analysis, the detection of differences between the transcriptomes of different cells, may improve understanding of cell differentiation and development and enable the identification of bi...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره 27 شماره
صفحات -
تاریخ انتشار 2011